
    On the Composition and Limitations of Publicly Available COVID-19 X-Ray Imaging Datasets

    Machine learning based methods for diagnosis and progression prediction of COVID-19 from imaging data have gained significant attention in recent months, in particular through the use of deep learning models. In this context, hundreds of models have been proposed, the majority of them trained on public datasets. Data scarcity, mismatch between training and target population, group imbalance, and lack of documentation are important sources of bias, hindering the applicability of these models to real-world clinical practice. Considering that datasets are an essential part of model building and evaluation, a deeper understanding of the current landscape is needed. This paper presents an overview of the currently publicly available COVID-19 chest X-ray datasets. Each dataset is briefly described, and potential strengths, limitations, and interactions between datasets are identified. In particular, some key properties of current datasets that could be sources of bias and impair models trained on them are pointed out. These descriptions are useful for building models on these datasets: for choosing the best dataset according to the model goal, for taking the specific limitations into account to avoid reporting overconfident benchmark results, and for discussing their impact on generalisation capabilities in a specific clinical setting. (12 pages, 3 figures.)

    Power and Timing Side Channels for PUFs and their Efficient Exploitation

    We discuss the first power and timing side channels on Strong Physical Unclonable Functions (Strong PUFs) in the literature, and describe their efficient exploitation via adapted machine learning (ML) techniques. Our method is illustrated by the example of the two currently most secure (CCS 2010, IEEE T-IFS 2013) electrical Strong PUFs, so-called XOR Arbiter PUFs and Lightweight PUFs. It allows us for the first time to tackle these two architectures with a polynomial attack complexity. In greater detail, our power and timing side channels provide information on the single outputs of the many parallel Arbiter PUFs inside an XOR Arbiter PUF or Lightweight PUF. They indicate how many of these single outputs (in sum) were equal to one (and how many were equal to zero) before the outputs entered the final XOR gate. Taken by itself, this side channel information is of little value, since it does not tell which of the single outputs were zero or one, respectively. But we show that, if combined with suitably adapted machine learning techniques, it allows very efficient attacks on the two above PUFs, i.e., attacks that merely use linearly many challenge-response pairs and low-degree polynomial computation times. Without countermeasures, the two PUFs can hence no longer be called secure, regardless of their sizes. For comparison, the best-performing pure modeling attacks on the above two PUFs are known to have an exponential complexity (CCS 2010, IEEE T-IFS 2013). The practical viability of our new attacks is first demonstrated by ML experiments on numerically simulated CRPs. We thereby confirm attacks on the two above PUFs for up to 16 XORs and challenge bitlengths of up to 512. Secondly, we execute a full experimental proof-of-concept for our timing side channel, successfully attacking FPGA implementations of the two above PUF types for 8, 12, and 16 XORs, and bitlengths 64, 128, 256 and 512. In earlier works (CCS 2010, IEEE T-IFS 2013), 8-XOR architectures with bitlength 512 had been explicitly suggested as secure and beyond the reach of foreseeable attacks. Besides the abovementioned new power and timing side channels, two other central innovations of our paper are our tailor-made, polynomial ML algorithm that integrates the side channel information, and the implementation of Arbiter PUF variants with up to 16 XORs and bitlength 512 in silicon. To our knowledge, such sizes have never been implemented before in the literature. Finally, we discuss efficient countermeasures against our power and timing side channels. They could and should be used to secure future Arbiter PUF generations against the latter.
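
    The leaked quantity described above can be made concrete with a small simulation. Below is a minimal Python sketch under the standard additive linear delay model of Arbiter PUFs (numpy only; the bit length, the number of XORed chains, and all parameter choices are illustrative and not taken from the paper): the regular interface returns only the XOR of the hidden single outputs, whereas the side channel reveals how many of them were one.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_xor_arbiter_puf(n_bits, k, rng):
    """Weight vectors of k parallel Arbiter PUFs (additive linear delay model)."""
    return rng.normal(size=(k, n_bits + 1))

def phi(challenge):
    """Standard parity feature vector of an Arbiter PUF challenge."""
    c = 1 - 2 * challenge                        # map bits {0,1} -> {+1,-1}
    suffix_products = np.cumprod(c[::-1])[::-1]  # phi_i = prod_{j >= i} (1 - 2 c_j)
    return np.append(suffix_products, 1.0)       # constant term for the final delay offset

def query(weights, challenge):
    """Return hidden single outputs, the XOR response, and the side-channel value."""
    feats = phi(challenge)
    single = (weights @ feats > 0).astype(int)   # outputs of the individual chains (hidden)
    xor_response = int(single.sum() % 2)         # what the device normally outputs
    side_channel = int(single.sum())             # what power/timing analysis would leak:
                                                 # how many outputs were one, but not which
    return single, xor_response, side_channel

puf = make_xor_arbiter_puf(n_bits=64, k=8, rng=rng)
challenge = rng.integers(0, 2, size=64)
single, response, leak = query(puf, challenge)
print("single outputs:", single)
print("XOR response  :", response)
print("side channel  :", leak)
```

    With this extra count, each measured challenge-response pair constrains the individual chains far more strongly than the single XOR bit does, which is what the adapted ML attacks described above exploit.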

    Topographic Coding Principles in Olfactory Systems (original title: Topographische Kodierungsprinzipien in olfaktorischen Systemen)

    Topographic representation of stimulus features along the neural sheet is a commonly observed paradigm in sensory coding. Although there is growing evidence of such a topographic arrangement in olfactory systems, no definite functional topographies have yet been established. To this end, this thesis contributes towards establishing topographic coding principles in olfactory systems. It investigates functional topography both in the olfactory relay centre of mice, the olfactory bulb, and in a secondary olfactory centre of Drosophila, the lateral horn. It thereby provides additional evidence for the hypothesis that receptive fields in the olfactory bulb are spatially grouped according to the overlap of their response spectra. Furthermore, it shows that a topographic readout of the olfactory relay centre in Drosophila, the antennal lobe, yields local response areas in the lateral horn associated with innate valence. All in all, this thesis emphasizes the functional role of topography in olfactory systems. Alongside this biological question, two computational methods that assist olfactory research are introduced and refined. First, regularized non-negative matrix factorization is introduced as a tool to automatically disaggregate functional imaging measurements into response domains. Second, quantitative structure-activity relationship (QSAR) models are employed to obtain a quantitative physico-chemical description of olfactory receptive fields.
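
    As a rough illustration of the first computational method mentioned above, the sketch below factorises a synthetic imaging time series into spatial response domains and their time courses using non-negative matrix factorization. It relies on scikit-learn's generic sparsity regularization (parameter names as in scikit-learn >= 1.0) as a stand-in; the data shapes, the regularization strength, and the exact regularizer are assumptions rather than the method developed in the thesis.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)

# Synthetic "imaging" data: T frames of a small image, arranged as a
# non-negative matrix X of shape (time, pixels).
n_time, height, width, n_sources = 200, 20, 20, 3
spatial = np.clip(rng.normal(size=(n_sources, height * width)), 0, None)  # response domains
temporal = np.clip(rng.normal(size=(n_time, n_sources)), 0, None)         # their time courses
X = temporal @ spatial + 0.05 * rng.random((n_time, height * width))      # noisy mixture

# Sparsity-regularized NMF: X ~ W H, with W the temporal activations and
# H the spatial response domains (one map per component).
model = NMF(n_components=n_sources, init="nndsvda",
            alpha_W=0.1, l1_ratio=0.5, max_iter=500, random_state=0)
W = model.fit_transform(X)                        # (time, components)
H = model.components_                             # (components, pixels)
domains = H.reshape(n_sources, height, width)     # spatial maps of the response domains
print(W.shape, domains.shape)
```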

    On the Foundations of Physical Unclonable Functions

    We investigate the foundations of Physical Unclonable Functions from several perspectives. Firstly, we discuss formal and conceptual issues in the various current definitions of PUFs. As we argue, they have the effect that many PUF candidates formally meet no existing definition. Next, we present alternative definitions and a new formalism. It avoids asymptotic concepts like polynomial time, but is based on concrete time bounds and on the concept of a security experiment. The formalism splits the notion of a PUF into two new notions, Strong t-PUFs and Obfuscating t-PUFs. Then, we provide a comparative analysis between the existing definitions and our new notions, by classifying existing PUF implementations with respect to them. In this process, we use several new and unpublished machine learning results. The outcome of this comparative classification is that our definitions seem to match the current PUF landscape well, perhaps better than previous definitions. Finally, we analyze the security and practicality features of Strong and Obfuscating t-PUFs in concrete applications, obtaining further justification for the split into two notions
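
    The idea of grounding the definitions in a concrete security experiment rather than in asymptotic notions can be pictured with a toy sketch (illustrative only, not the paper's formal experiment; the PUF model, the adversary, and all numbers are assumptions): an adversary observes t challenge-response pairs, fits a model, and is then scored on fresh challenges. A construction in the spirit of a Strong t-PUF should keep this score close to chance; the plain Arbiter PUF simulated below clearly does not.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

def phi(challenges):
    """Parity features of Arbiter PUF challenges, one row per challenge."""
    c = 1 - 2 * challenges
    feats = np.cumprod(c[:, ::-1], axis=1)[:, ::-1]
    return np.hstack([feats, np.ones((len(challenges), 1))])

def security_experiment(weights, t, n_fresh=2000):
    """Toy CRP prediction experiment: adversary sees t CRPs, predicts fresh responses."""
    n_bits = weights.size - 1
    train_ch = rng.integers(0, 2, size=(t, n_bits))
    train_r = (phi(train_ch) @ weights > 0).astype(int)
    adversary = LogisticRegression(max_iter=1000).fit(phi(train_ch), train_r)
    test_ch = rng.integers(0, 2, size=(n_fresh, n_bits))
    test_r = (phi(test_ch) @ weights > 0).astype(int)
    return adversary.score(phi(test_ch), test_r)   # prediction rate on fresh challenges

weights = rng.normal(size=65)                      # one 64-bit Arbiter PUF, linear delay model
for t in (50, 500, 5000):
    print(t, security_experiment(weights, t))      # approaches 1.0: predictable, hence weak
```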

    Public Covid-19 X-ray datasets and their impact on model bias - a systematic review of a significant problem

    Computer-aided diagnosis and stratification of COVID-19 based on chest X-rays suffer from weak bias assessment and limited quality control. Undetected bias induced by inappropriate use of datasets and improper consideration of confounders prevents the translation of prediction models into clinical practice. By adapting established tools for model evaluation to the task of evaluating datasets, this study provides a systematic appraisal of publicly available COVID-19 chest X-ray datasets, determining their potential use and evaluating potential sources of bias. Only 9 out of more than a hundred identified datasets met at least the criteria for proper assessment of the risk of bias and could be analysed in detail. Remarkably, most of the datasets utilised in 201 papers published in peer-reviewed journals are not among these 9 datasets, leading to models with a high risk of bias. This raises concerns about the suitability of such models for clinical use. This systematic review highlights the limited description of the datasets employed for modelling and helps researchers select the most suitable datasets for their task.
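
    Purely to illustrate what such a dataset-level appraisal can look like once documentation is encoded as structured metadata, here is a minimal sketch; the fields and criteria are hypothetical and far simpler than those applied in the review.

```python
# Hypothetical screening step: which candidate datasets document enough metadata
# for their risk of bias to be assessed at all (fields and criteria illustrative only).
datasets = {
    "dataset_A": {"n_images": 30000, "patient_ids": True,  "source_documented": True,
                  "label_protocol": True},
    "dataset_B": {"n_images": 900,   "patient_ids": False, "source_documented": True,
                  "label_protocol": False},
    "dataset_C": {"n_images": 5000,  "patient_ids": True,  "source_documented": False,
                  "label_protocol": True},
}

required = ("patient_ids", "source_documented", "label_protocol")

def assessable(meta):
    """A dataset can only be appraised for bias if its key properties are documented."""
    return all(meta[field] for field in required)

usable = [name for name, meta in datasets.items() if assessable(meta)]
print("assessable for risk of bias:", usable)   # only dataset_A in this toy example
```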

    Model bias and its impact on computer-aided diagnosis: A data-centric approach

    Machine learning and data-driven solutions open exciting opportunities in many disciplines, including healthcare. The recent transition of this technology into real clinical settings brings new challenges. Such problems derive from several factors, including dataset origin, composition, and description, which hamper the fairness and secure application of the resulting models. Given the potential impact of incorrect predictions, addressing these issues in applied-ML healthcare research is urgent. Therefore, in this work, available systematic tools for assessing the risk of bias in models are employed as a first step towards robust solutions for better dataset choice, dataset merging, and the design of the training and validation steps in the ML development pipeline.
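
    One concrete data-centric measure in that spirit is to design the training/validation split around a known confounder, so that, for instance, no acquisition site contributes images to both sides. The sketch below shows this with scikit-learn's GroupShuffleSplit; the metadata fields, the grouping variable, and the sizes are assumptions, not details from this work.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(3)

# Toy metadata: each image comes from one acquisition site (a possible confounder).
n_images = 1000
sites = rng.integers(0, 8, size=n_images)        # hypothetical site / source identifier
labels = rng.integers(0, 2, size=n_images)
features = rng.normal(size=(n_images, 16))

# Group-aware split: no site appears in both training and validation, so a model
# cannot exploit site-specific artefacts to "predict" the label during validation.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, val_idx = next(splitter.split(features, labels, groups=sites))

assert set(sites[train_idx]).isdisjoint(sites[val_idx])
print("train sites:", sorted(set(sites[train_idx])))
print("val sites:  ", sorted(set(sites[val_idx])))
```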

    Efficient power and timing side channels for physical unclonable functions

    One part of the original PUF promise was their improved resilience against physical attack methods, such as cloning, invasive techniques, and arguably also side channels. In recent years, however, a number of effective physical attacks on PUFs have been developed. […] Our strategy is demonstrated in silicon on FPGAs, where we attack the above two architectures for up to 16 XORs and 512 bits. For comparison, in earlier works XOR-based Arbiter PUF designs with only up to 5 or 6 XORs and 64 or 128 bits had been tackled successfully. Designs with 8 XORs and 512 bits had been explicitly recommended as secure for practical use. […] Together with recent modeling attacks […]

    Leveraging state-of-the-art architectures by enriching training information - a case study

    Our working hypothesis is that the key factors in COVID-19 imaging are the available imaging data and their label noise and confounders, rather than network architectures per se. Thus, we applied existing state-of-the-art convolutional neural network frameworks based on the U-Net architecture, namely nnU-Net [3], and focused on leveraging the available training data. We did not apply any pre-training, nor did we modify the network architecture. First, we enriched the training information by generating two additional labels for the lung and body area. Lung labels were created with a publicly available lung segmentation network, and weak body labels were generated by thresholding. Subsequently, we trained three different multi-class networks: 2-label (original background and lesion labels), 3-label (additional lung label) and 4-label (additional lung and body label). The 3-label network obtained the best single-network performance in internal cross-validation (Dice score 0.756) and on the leaderboard (Dice score 0.755, Hausdorff95 score 57.5). To improve robustness, we created a weighted ensemble of all three models, with calibrated weights to optimise the ranking in Dice score. This ensemble achieved a slight performance gain in internal cross-validation (Dice score 0.760). On the validation set leaderboard, it improved our Dice score to 0.768 and Hausdorff95 score to 54.8. It ranked 3rd in phase I according to mean Dice score. Adding unlabelled data from the public TCIA dataset in a student-teacher manner significantly improved our internal validation score (Dice score of 0.770). However, we noticed partial overlap between our additional training data (although not human-labelled) and the final test data, and therefore submitted the ensemble without additional data to yield realistic assessments.
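
    The weighted-ensemble step can be pictured with a small sketch: per-voxel lesion probabilities from the three networks are averaged with calibrated weights before thresholding, and candidate weights are compared by the resulting Dice score. The array shapes, the candidate weights, and the toy Dice function below are illustrative; the submission's actual implementation details are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)

def dice_score(prob, target, eps=1e-6):
    """Dice overlap between a thresholded prediction and a binary target."""
    pred = (prob >= 0.5).astype(float)
    inter = (pred * target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)

# Toy lesion probabilities of the 2-, 3- and 4-label networks on one flattened volume.
target = (rng.random(10000) < 0.1).astype(float)
probs = np.stack([np.clip(target + rng.normal(0, s, size=target.size), 0, 1)
                  for s in (0.45, 0.35, 0.40)])       # three models of differing quality

def ensemble(probs, weights):
    """Weighted average of the per-model probability maps."""
    w = np.asarray(weights, dtype=float)
    return np.tensordot(w / w.sum(), probs, axes=1)

for w in [(1, 1, 1), (1, 2, 1), (0.5, 2.0, 1.0)]:     # calibrated weights would be tuned
    print(w, round(dice_score(ensemble(probs, w), target), 4))
```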